Debit Credit Council

Renamed: Transaction Processing Performance Council
Benchmark: TPC Benchmark A™
Members:
  AT&T, Biin, CDC, Computer Associates, Cullinet, DG, DEC, Fujitsu,
  HP, HB, IBM, ICL, Informix, NCR, Oracle, Prime, Pyramid, RTI,
  Sequent, Sequoia, Stratus, Sun, Sybase, Tandem, Teradata,
  Tolerant, Unisys, Wang

Harder:
  Measure response time at driver system
  Reply must return new balance
Easier:
  Shrink terminal net by 10X
  Eliminate Presentation Services
  Shrink history file by 3x

Response time: 90% @ 2 seconds (vs 95% @ 1 sec)
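The TPC-A response-time rule is a percentile test on times measured at the driver. A minimal sketch, assuming the nearest-rank definition of a percentile (the benchmark's exact statistical procedure is not reproduced here); `percentile` and `meets_response_time` are hypothetical helper names:

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def meets_response_time(samples, pct=90.0, limit_s=2.0):
    """True if the pct-th percentile response time (seconds) is within limit_s."""
    return percentile(samples, pct) <= limit_s


# 100 hypothetical driver-measured response times, in seconds.
times = [0.4] * 85 + [1.5] * 6 + [3.0] * 9
print(meets_response_time(times))                         # True: 90% @ 2 s
print(meets_response_time(times, pct=95.0, limit_s=1.0))  # False: 95% @ 1 s
```

With the same measurements, the relaxed rule passes while the older 95% @ 1 second rule fails.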

BIG DEBATE:
  How to characterize the network?
  LAN?
  WAN?

Contact: Omri Serlin,
ITOM International
POB 1450
Los Altos, CA 94022
415-948-4516

DISC ECONOMICS / TRENDS

Hoagland: Disc Magnetic Areal Density (MAD) = $10^{(year-1970)/10}$ Mb/in²
Moore: RAM Memory Density = $10^{(year-1970)/5}$ Kb/chip

Disc: ~5$/MB - 20$/MB, .1$/access - 4k$/access
RAM: 100$/MB - 5k$/MB ???

Next Decade: Disc & Controller ~100$ for ~1GB => .1$/MB
RAM Wafer: ~1K$ for ~.5GB => 1$/MB
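The two density trends can be evaluated directly. A minimal sketch, taking MAD to grow 10x per decade (consistent with the ~3x/decade growth of $\sqrt{MAD}$ used on later slides) and RAM 100x per decade; the function names are hypothetical:

```python
import math


def mad_mb_per_sq_in(year):
    """Hoagland trend: disc magnetic areal density, 10**((year - 1970) / 10) Mb/in^2."""
    return 10 ** ((year - 1970) / 10)


def ram_kb_per_chip(year):
    """Moore trend: RAM density, 10**((year - 1970) / 5) Kb/chip."""
    return 10 ** ((year - 1970) / 5)


def dollars_per_mb(unit_price, capacity_mb):
    return unit_price / capacity_mb


for year in (1970, 1980, 1990):
    print(year, round(mad_mb_per_sq_in(year), 1), round(ram_kb_per_chip(year)))

# Linear (and track) density grows as sqrt(MAD): about 3x per decade.
print(round(math.sqrt(mad_mb_per_sq_in(1990) / mad_mb_per_sq_in(1980)), 2))
# Next-decade disc: ~100$ for ~1GB.
print(dollars_per_mb(100, 1000))
```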

DISC ECONOMICS TODAY

Someday: Disc will be "tape":
  Cheap archive sequential storage,
  NOT Random Block Access Storage Device

Today: 5 minute rule applies:
  keep a page in RAM if it is accessed at least every 5 minutes

J. Gray, F. Putzolu, The 5 Minute Rule for Trading Memory for Disc Accesses, and the 10 Byte Rule for Trading Memory for CPU Instructions, ACM SIGMOD Proceedings, June 1987.
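The rule comes from equating the one-time cost of holding a page in RAM with the cost of the disc-arm capacity needed to re-read it. A minimal sketch of that break-even, with illustrative prices (not the paper's own figures) and hypothetical function names:

```python
def break_even_interval_s(dollars_per_access_per_s, dollars_per_page_in_ram):
    """Reference interval (seconds) below which caching a page is cheaper than re-reading it.

    Caching costs dollars_per_page_in_ram once. Re-reading the page every T seconds uses
    1/T accesses/second of disc-arm capacity, costing dollars_per_access_per_s / T.
    The two are equal when T = dollars_per_access_per_s / dollars_per_page_in_ram.
    """
    return dollars_per_access_per_s / dollars_per_page_in_ram


def keep_in_ram(reference_interval_s, dollars_per_access_per_s, dollars_per_page_in_ram):
    return reference_interval_s <= break_even_interval_s(
        dollars_per_access_per_s, dollars_per_page_in_ram)


# Illustrative prices only: 2000$ per access/second of disc, 5$ per page of RAM.
print(break_even_interval_s(2000, 5))                     # seconds
print(keep_in_ram(60, 2000, 5), keep_in_ram(3600, 2000, 5))
```

With these prices the break-even is 400 seconds, in the neighbourhood of five minutes.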

THE BIG DISC PROBLEM: Disc delivers 25 accesses/second:
  100MB => 1 a/s per 4MB
  1GB   => 1 a/s per 40MB
  100GB => 1 a/s per 4GB
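The table is capacity divided by a fixed arm rate. A minimal sketch (hypothetical function name) assuming one arm per disc at 25 accesses/second:

```python
def mb_per_access_per_s(capacity_mb, accesses_per_s=25):
    """Megabytes served by each access/second of arm capacity (one arm per disc)."""
    return capacity_mb / accesses_per_s


for capacity_mb in (100, 1_000, 10_000, 100_000):
    print(f"{capacity_mb:>7} MB: 1 a/s per {mb_per_access_per_s(capacity_mb):g} MB")
```

Bigger discs with the same arm spread each access/second over ever more data.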

EVEN TODAY, DISC NEEDS TO BE USED SEQUENTIALLY

DISC SPEED vs TIME
[Figure: sequential bytes/sec, random bytes/sec and random accesses/sec vs year, 1960-2000, log scale]

1. ACCESS RATE NOT MUCH IMPROVED
2. SEQUENTIAL BANDWIDTH 100X RANDOM

SO: USE SEQUENTIAL ("DISC IS TAPE!")
  LARGE BLOCK TRANSFERS
  CONVERT RANDOM IO TO LOG IO
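Why large blocks help: every positioning delay is amortized over the bytes moved. A minimal sketch, assuming 40 ms of positioning per access (25 accesses/second) and a hypothetical 10 MB/s media rate; `effective_bandwidth` is a hypothetical name:

```python
def effective_bandwidth(block_bytes, positioning_ms=40.0, media_bytes_per_s=10e6):
    """Bytes/second delivered when every block pays one positioning delay plus its transfer."""
    transfer_s = block_bytes / media_bytes_per_s
    return block_bytes / (positioning_ms / 1000 + transfer_s)


random_4k = effective_bandwidth(4096)      # one random 4KB access per positioning
sequential = 10e6                          # streaming: no positioning between blocks
big_block = effective_bandwidth(1 << 20)   # one 1MB transfer per positioning
print(round(sequential / random_4k))       # sequential vs random
print(round(big_block / sequential, 2))    # fraction of media rate with large blocks
```

Under these assumptions sequential access is about 100x random 4KB access, and 1MB transfers recover over 70% of the media rate.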

Laws of Nature

[Figure: disc arm]

Discs rotate at 60 rps (1800 -> 2400 -> 3600 rpm)
  => 60 io/sec max (50 due to creep)
  => ~16ms/rotation
  May rise in future

Service_time = Seek + Settle + Rotate + Transfer

Settle ~ 2ms
Rotate ~ 1/2 (16ms) ~ 8ms
Work on Seek & Transfer
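The service-time formula is a plain sum once Rotate is taken as half a revolution. A minimal sketch; the 18 ms seek and 1 ms transfer in the usage line are hypothetical inputs:

```python
def rotation_ms(rpm):
    return 60_000 / rpm


def service_time_ms(seek_ms, transfer_ms, rpm=3600, settle_ms=2.0):
    """Service_time = Seek + Settle + Rotate + Transfer; Rotate averages half a revolution."""
    return seek_ms + settle_ms + rotation_ms(rpm) / 2 + transfer_ms


print(round(rotation_ms(3600), 1))                            # ms per revolution at 60 rps
print(round(service_time_ms(seek_ms=18, transfer_ms=1), 1))   # ms per access
```

Settle and Rotate are fixed by physics here, which is why the slide says to work on Seek and Transfer.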

Gray  Discs  2/2/89  9 


SEEK TIME

Seek_time ~ $\sqrt{distance}$
because:
  1. constant acceleration
     [Figure: velocity vs time at constant acceleration]
  2. area under curve (distance) ~ $time^2$

Expected seek distance:
  If random access, then 1/3 of total tracks
  (difference of two random variables).
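Both claims can be checked numerically. A minimal sketch with hypothetical function names: an exact expected distance for two uniformly random tracks, and a bang-bang (accelerate then decelerate) seek model:

```python
import math
from fractions import Fraction


def expected_seek_distance(n_tracks):
    """Exact E|i - j| for independent, uniformly random tracks i, j in 0..n_tracks-1.

    There are 2 * (n - d) ordered pairs at distance d, giving (n*n - 1) / (3n), about n/3.
    """
    total = sum(2 * d * (n_tracks - d) for d in range(1, n_tracks))
    return Fraction(total, n_tracks * n_tracks)


def bang_bang_seek_ms(distance, accel):
    """Accelerate over the first half of the distance, decelerate over the second.

    Each half covers distance/2 = accel * t**2 / 2, so the seek takes 2 * sqrt(distance / accel).
    distance and accel must use consistent units (e.g. tracks and tracks/ms**2).
    """
    return 2 * math.sqrt(distance / accel)


print(float(expected_seek_distance(1000)))                        # about 1000/3 tracks
print(bang_bang_seek_ms(400, 1.0) / bang_bang_seek_ms(100, 1.0))  # 4x distance -> 2x time
```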

Trends:
  As discs get smaller (14" -> 9" -> 8" -> 5¼" -> 3½"):
    seek distance decreases (linear)
    seek time decreases as $\sqrt{stroke/3}$
    arms are lighter => faster acceleration
    less power, stress => reliable and cheap

  MAD increase implies less seek needed: $\sqrt{MAD}$ grows ~3x/decade
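The square-root scaling makes the form-factor trend easy to tabulate. A minimal sketch, assuming stroke proportional to diameter and equal arm acceleration for every size; `relative_average_seek` is a hypothetical name:

```python
import math

DIAMETERS_IN = [14.0, 9.0, 8.0, 5.25, 3.5]


def relative_average_seek(diameter_in, reference_in=14.0):
    """Average seek time relative to the reference drive.

    Assumes stroke ~ diameter, mean seek distance = stroke / 3, seek time ~ sqrt(distance),
    and the same arm acceleration for every size (lighter arms would do better still).
    """
    return math.sqrt((diameter_in / 3) / (reference_in / 3))


for d in DIAMETERS_IN:
    print(f'{d:>5}": {relative_average_seek(d):.2f}')
```

Going from 14" to 3½" halves the average seek time from geometry alone, before any gain from lighter arms.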

TRANSFER TIME

Transfer_time ~ bytes/bandwidth

Typical Bandwidth: 1 MB/s ... 10 MB/s

Bandwidth ~ Rotations/sec * Bytes/track

but Rotations/sec ~ 60 is a universal constant, so Bandwidth ~ Bytes/track:

Bytes/track = (Bytes/inch) * (inches/track)
            ~ $\sqrt{MAD}$ * Diameter

Trend: Discs are shrinking 14" -> 9" -> 8" -> 5¼" -> 3½":
  => Diameter is shrinking (3x in this decade). Perhaps this will end.
  => $\sqrt{MAD}$ increases ~3x/decade
  Net: zero change in bandwidth

"Solution": Parallel read from multiple heads

FORMATTING

Disc track formatted into blocks or sectors (512 bytes is typical),
separated by gaps.
Gaps are fixed by switching times and
  by the speed of light to the controller/CPU.
As density increases, gaps dominate space.
At present 25% gap, 75% data is typical.

=> Formatted capacity ~ .75 rated capacity
=> Data bandwidth ~ .75 rated bandwidth

"Solution": Bigger blocks: 4KB => 8x fewer blocks (and gaps)
  97% used space
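The bandwidth and formatting relations on the last two slides combine into one calculation. A minimal sketch under stated simplifications (linear density ~ $\sqrt{MAD}$, track length ~ π × diameter, a fixed gap per block); all function names and the 50 Mb/in² example disc are hypothetical:

```python
import math


def bytes_per_track(mad_bits_per_sq_in, diameter_in):
    """Bytes/track ~ (bytes/inch) * (inches/track).

    Simplifications: linear density ~ sqrt(MAD) bits/inch, track length ~ pi * diameter.
    """
    return math.sqrt(mad_bits_per_sq_in) * math.pi * diameter_in / 8


def rated_bandwidth(rps, mad_bits_per_sq_in, diameter_in, heads=1):
    """Bandwidth ~ Rotations/sec * Bytes/track, times heads read in parallel."""
    return heads * rps * bytes_per_track(mad_bits_per_sq_in, diameter_in)


def decade_bandwidth_ratio(mad_growth=10.0, diameter_shrink=3.0):
    """Change in bandwidth over a decade at fixed rotation speed."""
    return math.sqrt(mad_growth) / diameter_shrink


def implied_gap_bytes(sector_bytes=512, data_frac=0.75):
    """Per-block gap implied by a sector size and its observed data fraction."""
    return sector_bytes * (1 - data_frac) / data_frac


def data_fraction(block_bytes, gap_bytes):
    return block_bytes / (block_bytes + gap_bytes)


print(round(decade_bandwidth_ratio(), 2))     # ~1: no net change
gap = implied_gap_bytes()                     # ~171 bytes per gap
print(round(data_fraction(4096, gap), 2))     # 4KB blocks: ~96% data
one = rated_bandwidth(60, 50e6, 5.25)         # hypothetical 50 Mb/in^2, 5.25" disc
print(round(rated_bandwidth(60, 50e6, 5.25, heads=4) / one))  # 4 heads -> 4x
```

With a fixed gap per block, 4KB blocks come out at 96% data, close to the ~97% quoted above.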

SUMMARY OF DISC PHYSICS

Service time = Seek + Settle + Rotate + Transfer

Work on Queue, Seek & Transfer

[Figure: service-time breakdown, including rotate and transfer components]

HOW TO DESIGN A DISC SUBSYSTEM

[Figure: channel and disc]

WHERE DOES THE ACCESS TIME GO?

PREDICTED:
  QUEUE      30 ms
  SEEK       18 ms
  ROTATE     8.4 ms
  TRANSFER

SERVICE TIME VS UTILIZATION

[Figure: service time vs utilization (0-100%)]
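A common first-order model for how response time grows with utilization is the M/M/1 queue; using it here is an assumption, not something the slide states. A minimal sketch with a hypothetical 30 ms service time:

```python
def mm1_response_ms(service_ms, utilization):
    """M/M/1 model: response time = service time / (1 - utilization)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_ms / (1.0 - utilization)


for u in (0.0, 0.2, 0.4, 0.6, 0.8, 0.9):
    r = mm1_response_ms(30.0, u)
    print(f"utilization {u:.0%}: response {r:6.1f} ms, queueing {r - 30.0:6.1f} ms")
```

At 50% utilization the queueing delay already equals the service time, and it grows without bound as utilization approaches 100%.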

WHERE DOES THE ACCESS TIME GO?

              QUEUE   SEEK    ROTATE   TRANSFER
  PREDICTED:  30 ms   18 ms   8.4 ms
  MEASURED:   30 ms   10 ms   10 ms    10 ms

R.A. Scranton & D.A. Thompson, The Access Time Myth, IBM Research Report RC 10197 (#45223), 9/21/83

WHERE DOES THE ACCESS TIME GO: SEEK (predicted 18 ms, measured 10 ms)

MOST SEEKS ARE SHORT
[Figure: percentage of seeks (0-100%) vs seek distance (0-1000)]

SUGGESTION: AVOID ZERO-LENGTH SEEKS

WHERE DOES THE ACCESS TIME GO: ROTATE (predicted 8.4 ms, measured 10 ms)

10% RPS (rotational position sensing) MISS BECAUSE
  CONTROLLER BUSY
  CHANNEL BUSY
  CPU BUSY

SUGGESTION:
  ABANDON RPS
  PUT BUFFER ON DISC CONTROLLER
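An RPS miss costs a full extra revolution, which is enough to explain the gap between predicted and measured rotation time. A minimal sketch, assuming each reconnect attempt misses independently; the 16.8 ms revolution is chosen so half a revolution equals the predicted 8.4 ms, and `expected_rotate_ms` is a hypothetical name:

```python
def expected_rotate_ms(rotation_ms, miss_prob):
    """Average rotational delay with RPS misses.

    Model (assumption): half a revolution on average to reach the sector, and each
    reconnect attempt independently misses with probability miss_prob, costing one more
    full revolution. Expected extra revolutions (geometric) = p / (1 - p).
    """
    return rotation_ms / 2 + rotation_ms * miss_prob / (1 - miss_prob)


print(round(expected_rotate_ms(16.8, 0.0), 1))   # no misses: 8.4 ms
print(round(expected_rotate_ms(16.8, 0.1), 1))   # 10% misses: about 10 ms
```

Under this model a 10% miss rate lifts the rotational delay from 8.4 ms to about 10.3 ms, close to the measured 10 ms.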

WHERE DOES THE ACCESS TIME GO: CHANNEL

CHANNEL CONTENTION BECAUSE
  SLOW DEVICES
  BAD PROTOCOLS

SUGGESTION:
  BUFFER CHANNEL
  BURST MULTIPLEX CHANNEL

HOW TO DESIGN A DISC SUBSYSTEM

[Figure: CPU & memory connected through channels to a controller per disc]

CONTROLLER PER DISC AVOIDS QUEUES

TO AVOID QUEUEING WANT MANY
  ARMS
  CONTROLLERS
  CHANNELS

TO AVOID RPS MISS and
TO ALLOW BURST MULTIPLEX CHANNEL WANT
  BUFFERED CONTROLLERS
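"Many arms" can be sized from the load. A minimal sketch, assuming about 25 accesses/second per arm and load spread evenly across arms; `arms_needed` and the 500 a/s workload are hypothetical:

```python
import math


def arms_needed(total_accesses_per_s, max_utilization, accesses_per_arm_per_s=25):
    """Fewest arms that keep each arm at or below max_utilization, assuming the load
    spreads evenly and each arm serves about 25 accesses/second."""
    per_arm = accesses_per_arm_per_s * max_utilization
    return math.ceil(total_accesses_per_s / per_arm)


print(arms_needed(500, 0.5))   # 500 a/s with every arm at most 50% busy
print(arms_needed(500, 1.0))   # bare minimum, with no headroom for queueing
```

Keeping arms half busy doubles the arm count compared with running them flat out; under an M/M/1 model, 50% utilization already doubles response time.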

WHERE TO PUT BUFFERS

[Figure: CPU & memory, channel, disc controller, and peripherals (tapes, printers, com lines, terminals)]

BUFFER IN CPU MEMORY SAVES CHANNEL, CONTROLLER, ...
BUFFER IN DISC CONTROLLER COSTS EXTRA CHANNEL, CTLR

WHAT IF DISC BUFFER MUCH (10X) CHEAPER?

4k PAGE @ 5k$/MB => 20$
4k PAGE @ 500$/MB => 2$

3k instructions @ 50K$/MIP => 150$/access
channel + controller @ 300 a/s => 500$/access

BREAK EVEN IS ABOUT 30 SECONDS
SO CASE:
  HOT SPOT (RI < 30 sec): MAIN MEMORY
  WARM SPOT (RI in [30, 1000] sec): DISC BUFFER
  COLD SPOT (RI > 1000 sec): DISC
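The 30-second figure follows the same break-even logic as the 5 minute rule, applied between main memory and the disc buffer. A minimal sketch, reading RI as the interval between references to a page; which access costs the slide charges is not spelled out, so both variants are shown, and the function names are hypothetical:

```python
def break_even_s(dollars_per_access_per_s, page_cost_saved):
    """Reference interval (s) below which keeping a page in the dearer, closer tier pays off."""
    return dollars_per_access_per_s / page_cost_saved


def placement(ri_s, hot_s=30.0, cold_s=1000.0):
    """Map a page's reference interval to a tier using the slide's thresholds."""
    if ri_s < hot_s:
        return "MAIN MEMORY"
    if ri_s <= cold_s:
        return "DISC BUFFER"
    return "DISC"


page_saving = 20 - 2                                # 4KB page in main memory vs disc buffer ($)
print(round(break_even_s(500, page_saving)))        # channel + controller only
print(round(break_even_s(500 + 150, page_saving)))  # also charging the 3k instructions
print(placement(10), placement(300), placement(5000))
```

Either way the break-even lands at roughly 30 seconds (about 28 or 36).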

MIRRORED DISCS

[Figure: mirrored disc configuration: CPUs with memory and channels, dual controllers, two discs]

DUAL MODULES (controller, disc)
DUAL DATA PATHS (4 paths to data)
READ ANY, WRITE BOTH

EACH MODULE IS FAIL FAST (disc, controller, path)

$MTBF_{pair} \approx MTBF^2 / MTTR$
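READ ANY, WRITE BOTH can be shown with a toy model. A minimal sketch in which dicts stand in for discs and `MirroredPair` is a hypothetical class; repair and resynchronization of a failed disc are deliberately omitted:

```python
class MirroredPair:
    """Toy READ ANY, WRITE BOTH model: two fail-fast discs, simulated with dicts."""

    def __init__(self):
        self.discs = [{}, {}]
        self.up = [True, True]

    def fail(self, i):
        """Fail-fast: a failed disc stops serving requests immediately."""
        self.up[i] = False

    def write(self, block, data):
        """Write both copies (only the surviving one if its mirror is down)."""
        live = [i for i in (0, 1) if self.up[i]]
        if not live:
            raise OSError("both discs down")
        for i in live:
            self.discs[i][block] = data

    def read(self, block, prefer=0):
        """Read any: try the preferred disc, fall back to its mirror."""
        for i in (prefer, 1 - prefer):
            if self.up[i]:
                return self.discs[i][block]
        raise OSError("both discs down")


pair = MirroredPair()
pair.write(7, b"balance=100")
pair.fail(0)
print(pair.read(7))   # still served by disc 1
```

Data is lost only if the second disc fails before the first is repaired, which is where the MTTR in the formula comes from.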

DOES DISC DUPLEXING WORK?

1987 Tandem: 50,000 hr MTBF (6 years), 5 hr MTTR
  => ~65,000 year MTBF

OBSERVED IN LAST 24 MONTHS:
  35 double fails in ~46,400 pair-years
  => ~1300 years

CONCLUSION:
  IT WORKS WELL (200x better than no duplex).
  FAILURES NOT INDEPENDENT,
    NOT UNIFORM,
    INVOLVE CONTROLLERS...
  (50x worse than theory)
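The slide's arithmetic can be reproduced directly. A minimal sketch using the $MTBF^2 / MTTR$ estimate with independent failures (some derivations divide by $2 \cdot MTTR$ because either disc can fail first, which would halve the estimate); the function names are hypothetical:

```python
HOURS_PER_YEAR = 8760


def mirrored_mtbf_years(mtbf_years, mttr_hours):
    """Pair MTBF ~ MTBF**2 / MTTR, assuming independent failures (result in years)."""
    return mtbf_years * (mtbf_years * HOURS_PER_YEAR / mttr_hours)


def observed_mtbf_years(pair_years, double_failures):
    return pair_years / double_failures


theory = mirrored_mtbf_years(6, 5)           # 6-year disc MTBF, 5-hour repair
observed = observed_mtbf_years(46_400, 35)   # 35 double failures in ~46,400 pair-years
print(round(theory), round(observed))
print(round(observed / 6), round(theory / observed))  # vs. a single disc; vs. theory
```

This gives about 63,000 years in theory against about 1,300 observed: roughly 200x better than a single disc, yet about 50x worse than the independence assumption predicts.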

MIRRORED DISC PERFORMANCE

[Figure: seek time (ms) vs % reads for mirrored discs at low load (no queueing); write time is the max seek]

Read from closest arm => shortest of the two seeks
Write waits for farthest arm => longest of the two seeks
A mix of reads and writes gives the curve above

Note: Shortest service time includes shortest rotation
  => save an additional ~3ms
Total savings on mirrored reads: ~8ms (5 seek + 3 rotation)
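The read/write asymmetry and the rotation saving can be estimated by simulation. A minimal sketch under an assumed model (target and both arm positions independent and uniform; real arms become correlated after writes). For this model the exact values are 1/3 of the stroke for one arm, 5/24 for the closer arm and 11/24 for the farther arm, and the shorter of two rotational delays averages 1/3 revolution instead of 1/2:

```python
import random


def mirrored_expectations(samples=200_000, seed=1):
    """Monte Carlo estimates, as fractions of full stroke / one revolution.

    Assumed model: the target track and both arm positions are independent and uniform
    on [0, 1]; the two discs' rotational delays are independent and uniform on [0, 1).
    """
    rng = random.Random(seed)
    one_arm = read = write = rot_one = rot_min = 0.0
    for _ in range(samples):
        target, arm_a, arm_b = rng.random(), rng.random(), rng.random()
        dist_a, dist_b = abs(target - arm_a), abs(target - arm_b)
        one_arm += dist_a
        read += min(dist_a, dist_b)    # read from the closest arm
        write += max(dist_a, dist_b)   # write waits for the farthest arm
        rot_a, rot_b = rng.random(), rng.random()
        rot_one += rot_a
        rot_min += min(rot_a, rot_b)   # shortest rotation of the pair
    return {name: total / samples for name, total in
            [("one_arm", one_arm), ("read", read), ("write", write),
             ("rot_one", rot_one), ("rot_min", rot_min)]}


est = mirrored_expectations()
print({k: round(v, 3) for k, v in est.items()})
print(round((est["rot_one"] - est["rot_min"]) * 16, 1), "ms rotation saved at 16 ms/rev")
```

The rotation saving of about 1/6 revolution (about 2.7 ms at 16 ms per revolution) matches the ~3 ms above. The sketch treats the seek and rotation savings separately, as the 5+3 split does.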

MIRRORED DISC ARM SCHEDULING

Assume FIFO scheduling of requests.

Write scheduling is a no-brainer.

Read scheduling could be:
  Shortest Seek
  Shortest Service time
  Other?

For low loads all are about the same.

Between 30% and 90% utilization, Shortest Service time is best (~8% better).

Read only case:
[Figure: response time improvement of shortest-service-time over shortest-seek vs utilization (cases where there is <3% overall improvement are not shown)]

Even better for mixed reads and writes.

D. Bitton, J. Gray, Disk Shadowing, VLDB 1988 Proceedings, Morgan Kaufmann, Sept 1988.
D. Bitton, Arm Scheduling in Shadowed Disks, COMPCON 1989, IEEE Press, March 1989.
J. Gray, H. Sammer, S. Whitford, Shortest Seek vs Shortest Service Time Scheduling of Mirrored Disc Reads, Tandem Computers, December 1988.
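The two read policies differ only in what they minimize when choosing an arm. A minimal sketch with a hypothetical seek model (settle + k·√cylinders) and a hypothetical 16.7 ms revolution; each arm is described by its cylinder and the fraction of a revolution currently under its head:

```python
import math

ROTATION_MS = 16.7   # hypothetical: one revolution at ~3600 rpm


def seek_ms(distance, settle_ms=2.0, k=0.5):
    """Hypothetical seek model: settle + k * sqrt(cylinders); no seek costs nothing."""
    return 0.0 if distance == 0 else settle_ms + k * math.sqrt(distance)


def arm_cost(arm, cylinder, sector, policy):
    """arm = (current cylinder, fraction of a revolution under the head right now)."""
    arm_cyl, phase = arm
    seek = seek_ms(abs(cylinder - arm_cyl))
    if policy == "shortest_seek":
        return seek
    # shortest service time: seek, then wait for the target sector to come around
    angle_after_seek = (phase + seek / ROTATION_MS) % 1.0
    return seek + ((sector - angle_after_seek) % 1.0) * ROTATION_MS


def pick_arm(arms, cylinder, sector, policy):
    return min(range(len(arms)), key=lambda i: arm_cost(arms[i], cylinder, sector, policy))


# Arm 0 is on the target cylinder but the sector has just passed;
# arm 1 must seek 10 cylinders but arrives just before the sector.
arms = [(100, 0.51), (110, 0.20)]
print(pick_arm(arms, 100, 0.5, "shortest_seek"))     # 0
print(pick_arm(arms, 100, 0.5, "shortest_service"))  # 1
```

Shortest seek picks the arm already on the cylinder and waits almost a full revolution; shortest service time accepts a short seek to catch the sector sooner (about 5 ms instead of about 16.5 ms).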

WHAT ABOUT USING ARRAYS OF SMALL DISCS?

SMALL IS BEAUTIFUL:
  MASS PRODUCTION: LOW COST
  DISC IS FIELD REPLACEABLE UNIT

PARALLELISM => performance:
  disc striping => 10x bandwidth

PROBLEM WITH STRIPING:
THE BIG DISC PROBLEM: Disc delivers 25 accesses/second:
  100MB => 1 a/s per 4MB
  1GB   => 1 a/s per 40MB
  10GB  => 1 a/s per 400MB
  100GB => 1 a/s per 4GB
Arms are the scarce/queueing resource.
Good if DISC is treated as TAPE: purely sequential.
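Striping spreads consecutive blocks round-robin over the discs. A minimal sketch of the address mapping; `stripe_map` and its `stripe_unit` parameter are hypothetical:

```python
def stripe_map(logical_block, n_discs, stripe_unit=1):
    """Round-robin striping: logical block -> (disc index, block number on that disc).

    stripe_unit consecutive logical blocks go to one disc before moving to the next.
    """
    unit, offset = divmod(logical_block, stripe_unit)
    disc = unit % n_discs
    row = unit // n_discs
    return disc, row * stripe_unit + offset


# A 10-block sequential read on 10 discs touches every arm once (about 10x bandwidth) ...
print(sorted({stripe_map(b, 10)[0] for b in range(10)}))
# ... while each small random read still occupies a whole arm.
print(stripe_map(12345, 10))
```

Large sequential transfers gain the full parallelism, but small random accesses still cost one arm each, which is why striping favours treating disc as tape.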

WHAT ABOUT USING SMALL DISCS?

PROBLEM:
  MANY SMALL DISCS => MANY ERRORS

SOLUTIONS:
  DUPLEX Discs, Controllers, Paths, Power, ...:
    Good for small reads + writes

  RAID (Redundant Arrays of Independent Discs):
    N data discs + parity disc.
    Good for
      space utilization
      read cost (single read if no error)
    Write cost is 3x (read parity, write data, write parity)
      compared to duplex 2x
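Parity here is a byte-wise XOR across the stripe. A minimal sketch with hypothetical function names; note that the XOR update for a small write also uses the old data block, so unless that block is already in memory a small write needs one more read:

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_block(data_blocks):
    """Parity disc contents: XOR of the N data blocks in the same stripe."""
    return xor_blocks(data_blocks)


def reconstruct(surviving_blocks):
    """A single lost block (data or parity) is the XOR of all the surviving ones."""
    return xor_blocks(surviving_blocks)


def small_write_parity(old_parity, old_data, new_data):
    """Parity after overwriting one data block: old parity XOR old data XOR new data."""
    return xor_blocks([old_parity, old_data, new_data])


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]   # N = 3 data discs, 2-byte blocks
p = parity_block(data)
print(reconstruct([data[0], data[2], p]) == data[1])   # disc 1 lost and rebuilt
new = b"\x00\x00"
print(small_write_parity(p, data[1], new) == parity_block([data[0], new, data[2]]))
```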

G. Gibson, R. Katz, D. Patterson, A Case for Redundant Arrays of Inexpensive Discs (RAID), SIGMOD 88.

M. Kim, Synchronized Disc Interleaving, IEEE TOC, V. C35 #11, Nov 1986.

S. Ng, Design Alternatives for Disc Duplexing, IBM RJ 5481, Jan 1987.

S. Ng, D. Lang, R. Sellinger, Tradeoffs Between Devices and Paths in Achieving Disc Interleaving, IBM RJ 6140, Mar 1988.

S. Ng, Some Design Issues of Disc Arrays, Compcon 89.

G. Gibson, P. Chen, R. Katz, D. Patterson, Introduction to Redundant Arrays of Inexpensive Discs (RAID), Compcon 89.

M. Schulze, G. Gibson, R. Katz, D. Patterson, How Reliable is RAID?, Compcon 89.